release-23.1: sql: update connExecutor logic for pausable portals #101026
Merged: ZhouXing19 merged 8 commits into cockroachdb:release-23.1 from ZhouXing19:backport-map-0407 on Apr 10, 2023
Conversation
With the introduction of pausable portals, the comment for `limitedCommandResult` needs to be updated to reflect the current behavior. Release note: None
This change introduces a new session variable for a preview feature. When set to `true`, all non-internal portals whose statements are read-only [`SELECT`](../v23.1/selection-queries.html) queries without sub-queries or post-queries can be paused and resumed in an interleaving manner, though they are always executed with a local plan. Release note (SQL change): Added the session variable `multiple_active_portals_enabled`. This setting is only for a preview feature. When set to `true`, it allows multiple portals to be open at the same time, with their execution interleaved with each other. In other words, these portals can be paused. The underlying statement for a pausable portal must be a read-only `SELECT` query without sub-queries or post-queries, and such a portal is always executed with a local plan.
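A minimal usage sketch of the session variable, assuming a CockroachDB v23.1 cluster reachable at the given connection string (the URL is illustrative; only the variable name comes from the release note above). Since `database/sql` pools connections, the sketch pins a single connection so the `SET` applies to the same session that later creates the portals:

```go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // Postgres-wire driver; CockroachDB speaks pgwire.
)

func main() {
	ctx := context.Background()
	// Hypothetical connection string; adjust for your cluster.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pin one connection so the session variable affects the session
	// that later binds the portals.
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Opt this session into the preview feature. Portals created afterwards
	// become pausable if they meet the restrictions above (read-only SELECT,
	// no sub-/post-queries, local plan only).
	if _, err := conn.ExecContext(ctx, "SET multiple_active_portals_enabled = true"); err != nil {
		log.Fatal(err)
	}
}
```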
…e persistence

This commit is part of the implementation of multiple active portals. In order to execute portals in an interleaving manner, certain resources need to be persisted and their clean-up must be delayed until the portal is closed. Additionally, these resources don't need to be set up again when the portal is re-executed.

To achieve this, we store the cleanup steps in the `cleanup` function stacks in `portalPauseInfo`, and they are called when any of the following events occur:

1. The SQL transaction is committed or rolled back;
2. The connection executor is closed;
3. An error is encountered when executing the portal;
4. The portal is explicitly closed by the user.

The cleanup functions should be called in the same order as for a normal portal. Since a portal's execution follows the `execPortal() -> execStmtInOpenState() -> dispatchToExecutionEngine() -> flow.Run()` function flow, we categorize the cleanup functions into 4 "layers", which are stored accordingly in `PreparedPortal.pauseInfo`. The cleanup is always LIFO, following this order:

- `resumableFlow.cleanup`
- `dispatchToExecutionEngine.cleanup`
- `execStmtInOpenState.cleanup`
- `exhaustPortal.cleanup`

Additionally, if an error occurs in any layer, we clean up that layer and the deeper layers. For example, if an error occurs in `execStmtInOpenState()`, we perform `resumableFlow.cleanup` and `dispatchToExecutionEngine.cleanup` (the deeper layers) and then `execStmtInOpenState.cleanup` (the current layer) before returning the error to `execPortal()`, where `exhaustPortal.cleanup` will eventually be called. This preserves the previous clean-up process for portals as much as possible.

We also pass the `PreparedPortal` as a reference to the planner in `execStmtInOpenState()`, so that the portal's flow can be set and reused.

Release note: None
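To make the layered design concrete, here is a simplified, self-contained sketch of the cleanup-stack idea. The layer names mirror the commit message, but the types and helpers are invented for illustration, not CockroachDB's actual code:

```go
package main

import "fmt"

// cleanupStack collects the cleanup functions registered by one layer.
type cleanupStack struct {
	funcs []func()
}

func (s *cleanupStack) push(f func()) { s.funcs = append(s.funcs, f) }

// run executes this layer's cleanups in LIFO order, then empties the stack.
func (s *cleanupStack) run() {
	for i := len(s.funcs) - 1; i >= 0; i-- {
		s.funcs[i]()
	}
	s.funcs = nil
}

// pauseInfo mirrors the four layers named in the commit message.
type pauseInfo struct {
	resumableFlow             cleanupStack
	dispatchToExecutionEngine cleanupStack
	execStmtInOpenState       cleanupStack
	exhaustPortal             cleanupStack
}

// cleanupAll unwinds the layers deepest-first, matching the
// resumableFlow -> dispatchToExecutionEngine -> execStmtInOpenState ->
// exhaustPortal order described above.
func (p *pauseInfo) cleanupAll() {
	p.resumableFlow.run()
	p.dispatchToExecutionEngine.run()
	p.execStmtInOpenState.run()
	p.exhaustPortal.run()
}

func main() {
	var p pauseInfo
	p.execStmtInOpenState.push(func() { fmt.Println("close statement span") })
	p.dispatchToExecutionEngine.push(func() { fmt.Println("release planner resources") })
	p.resumableFlow.push(func() { fmt.Println("clean up flow") })
	p.cleanupAll() // prints: clean up flow, release planner resources, close statement span
}
```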
When executing or cleaning up a pausable portal, we may encounter an error and need to run the corresponding clean-up steps, which must check the latest `retErr` and `retPayload` rather than the values captured when the cleanup functions were created. To address this, we use `portal.pauseInfo.retErr` and `.retPayload` to record the latest error and payload. They need to be updated for each execution. Specifically:

1. If the error happens during portal execution, we ensure `portal.pauseInfo` records the error by adding the following code to the main body of `execStmtInOpenState()`:

    ```go
    defer func() {
        updateRetErrAndPayload(retErr, retPayload)
    }()
    ```

    Note that this defer must always be registered _after_ the defer that runs the cleanups, so that it fires first and the cleanups observe the recorded error.

2. If the error occurs during a certain cleanup step for the pausable portal, we ensure that the cleanup steps after it can see the error by always running `updateRetErrAndPayload(retErr, retPayload)` at the end of each cleanup step.

Release note: None
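The ordering constraint follows from Go's LIFO execution of deferred functions: the defer registered last fires first. A tiny runnable sketch (names simplified; `updateRetErr` is a stand-in for the real `updateRetErrAndPayload`):

```go
package main

import (
	"errors"
	"fmt"
)

type pauseInfo struct{ retErr error }

func (p *pauseInfo) updateRetErr(err error) { p.retErr = err }

func execStmtInOpenState(p *pauseInfo) (retErr error) {
	// Registered first, so it runs LAST: by then the defer below has
	// already recorded the final retErr, and the cleanups can read it.
	defer func() {
		fmt.Println("cleanup sees:", p.retErr)
	}()
	// Registered second, so it runs FIRST and captures the final retErr.
	defer func() { p.updateRetErr(retErr) }()

	return errors.New("portal execution failed")
}

func main() {
	_ = execStmtInOpenState(&pauseInfo{}) // prints: cleanup sees: portal execution failed
}
```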
This commit adds several restrictions to pausable portals to ensure that they work properly with the current changes to the consumer-receiver model. Specifically, pausable portals must meet the following criteria:

1. Not be internal queries;
2. Be read-only queries;
3. Not contain sub-queries or post-queries;
4. Only use local plans.

These restrictions are necessary because the current changes to the consumer-receiver model only consider the local push-based case. Release note: None
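As a hypothetical illustration of the four criteria (the type and field names are invented for this sketch, not CockroachDB's actual ones), an eligibility check might look like:

```go
package main

import "fmt"

// portalProps is an illustrative stand-in for the properties the
// connExecutor would inspect when deciding whether a portal is pausable.
type portalProps struct {
	isInternal          bool
	isReadOnly          bool
	hasSubOrPostQueries bool
	hasLocalPlan        bool
}

// isPausable applies the four restrictions from the commit message.
func isPausable(p portalProps) bool {
	return !p.isInternal &&
		p.isReadOnly &&
		!p.hasSubOrPostQueries &&
		p.hasLocalPlan
}

func main() {
	fmt.Println(isPausable(portalProps{isReadOnly: true, hasLocalPlan: true})) // true
}
```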
When resuming a portal, we always reset the planner. However, we still need the planner to respect the outer txn's situation, as we did in cockroachdb#98120. Release note: None
Release note: None
We now only support multiple active portals with local plans, so explicitly disable the feature for this test for now. Release note: None
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
rafiss approved these changes on Apr 10, 2023
thanks for getting it through!
Great work on getting this in! 🎉
Backport 8/8 commits from #99663.

This PR adds limited support for multiple active portals. Portals satisfying all of the following restrictions can now be paused and resumed (i.e., with other queries interleaving them):

1. Not an internal query;
2. A read-only query;
3. No sub-queries or post-queries.

Such a portal will only have its statement executed with a non-distributed plan.
This feature is gated by a session variable, `multiple_active_portals_enabled`. When it's set to `true`, all portals that satisfy the restrictions above will automatically become "pausable" when being created via the pgwire `Bind` stmt.

The core ideas of this implementation are:
1. Add a `switchToAnotherPortal` status to the result-consumption state machine. When we receive an `ExecPortal` message for a different portal, we simply return control to the connExecutor (sql: add `switchToAnotherPortal` signal for result consumer #99052); a rough sketch follows below.
2. Persist the `flow`, `queryID`, `span`, and `instrumentationHelper` for the portal, and reuse them when we re-execute the portal. This is to ensure we continue the fetching rather than starting all over. (sql: enable resumption of a flow for pausable portals #99173)

Note that we kept the implementation of the original "un-pausable" portals, as we'd like to limit this new functionality to only a small set of statements. Eventually some of the old code paths (e.g. the limitedCommandResult's lifecycle) should be replaced with the new code.
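As a rough sketch of the `switchToAnotherPortal` idea in item 1, here channels stand in for pgwire messages and the row stream; the names and types are illustrative, not the real connExecutor's:

```go
package main

import "fmt"

type consumerStatus int

const (
	done consumerStatus = iota
	switchToAnotherPortal
	pausedAtLimit
)

// consumeRows drains rows for the current portal until the row limit is
// reached or a request to execute a different portal arrives, in which
// case it hands control back to the caller (the connExecutor) with
// switchToAnotherPortal so the other portal can run and this one can be
// resumed later.
func consumeRows(rows <-chan int, limit int, switchReq <-chan struct{}) consumerStatus {
	for fetched := 0; fetched < limit; fetched++ {
		select {
		case <-switchReq:
			return switchToAnotherPortal
		case r, ok := <-rows:
			if !ok {
				return done // row stream exhausted
			}
			fmt.Println("row:", r)
		}
	}
	return pausedAtLimit // portal paused; resumable on the next ExecPortal
}

func main() {
	rows := make(chan int, 2)
	rows <- 1
	rows <- 2
	close(rows)
	fmt.Println(consumeRows(rows, 5, make(chan struct{}))) // prints both rows, then 0 (done)
}
```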
Also, we don't support distributed plans yet, as that involves much more complicated changes. See the "Start with an entirely local plan" section in the design doc. Support for this will come as a follow-up.

Epic: CRDB-17622
Release note (sql change): Initial support for multiple active portals. With the session variable `multiple_active_portals_enabled` set to `true`, portals satisfying all of the following restrictions can be executed in an interleaving manner: 1. Not an internal query; 2. A read-only query; 3. No sub-queries or post-queries. Such a portal will only have its statement executed with an entirely local plan.

Release justification: this is the implementation of an important feature.